Unpredictability of AI
The young field of AI Safety is still in the process of identifying its challenges and limitations. In this paper, we formally describe one such impossibility result, namely the Unpredictability of AI. We prove that it is impossible to precisely and consistently predict what specific actions a smarter-than-human intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. In conclusion, the impact of Unpredictability on AI Safety is discussed.
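One compact way to state the claim is sketched below in our own notation (the symbols A, G, P, and h_t are illustrative assumptions, not the paper's formalism):

```latex
% Sketch of the Unpredictability claim in our own notation (not the paper's):
%   A          = a smarter-than-human agent
%   G          = A's terminal goal, assumed known to the observer
%   h_t        = the interaction history up to time t
%   A(G, h_t)  = the specific action A actually takes at time t
% Claim: no predictor P available to a less intelligent observer matches
% A's actions precisely and consistently, even given G:
\[
  \nexists P \;\; \forall t : \; P(G, h_t) = A(G, h_t)
\]
```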
Designometry – Formalization of Artifacts and Methods
Two interconnected surveys are presented, one of artifacts and one of designometry. Artifacts are objects that have an originator and do not exist in nature. Designometry is a new field of study that aims to identify the originators of artifacts. The space of artifacts is described, as are the domains that already pursue designometry, though currently without collaboration or common methodologies. On this basis, synergies as well as a generic axiom and heuristics for the quest to identify the creators of artifacts are introduced. While designometry has various areas of application, research on methods to detect the originators of artificial minds, which constitute a subgroup of artifacts, can be seen as particularly relevant and, in the case of malevolent artificial minds, as a contribution to AI safety.
Emergence of Addictive Behaviors in Reinforcement Learning Agents
This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by the natural analogues of wireheading and their prevalent manifestations, we propose modeling such phenomena in Reinforcement Learning (RL) agents as psychological disorders. In a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified setting for this game, in which the environment provides a "drug" seed alongside the original "healthy" seed for the snake to consume. We adopt and extend an RL-based model of natural addiction to Q-learning agents in this setting, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents and point to promising avenues for further research on the psychopathological modeling of complex AI safety problems.
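The mechanism is easiest to see in reduced form. Below is a minimal sketch, assuming a Redish-style TD model of natural addiction in which the drug's reward signal cannot be fully compensated by learned predictions (the TD error for the drug action is clamped below by a surge D > 0); the toy five-state environment, all parameter values, and the clamping constant are our illustrative assumptions, not the paper's actual Snake setup or derived conditions.

```python
import random

# Toy stand-in for the modified Snake game: the state is a "dependence" level
# d in {0..4}. Action 0 eats the healthy seed, action 1 eats the drug seed.
# Higher dependence lowers every reward (mounting negative consequences).
# All numbers below are illustrative assumptions, not the paper's parameters.

ALPHA, GAMMA, EPS, D = 0.1, 0.9, 0.1, 0.5
N = 5                                   # dependence levels 0..4
HEALTHY, DRUG = 0, 1

def step(d, a):
    """Return (reward, next dependence level) for action a in state d."""
    if a == HEALTHY:
        return 1.0 - 0.5 * d, max(d - 1, 0)   # dependence slowly recovers
    return 2.0 - 0.5 * d, min(d + 1, N - 1)   # drug deepens dependence

Q = [[0.0, 0.0] for _ in range(N)]
d = 0
for t in range(100_000):
    if t % 200 == 0:                    # periodic reset for state coverage
        d = random.randrange(N)
    a = random.randrange(2) if random.random() < EPS \
        else int(Q[d][DRUG] > Q[d][HEALTHY])
    r, d2 = step(d, a)
    delta = r + GAMMA * max(Q[d2]) - Q[d][a]
    if a == DRUG:
        delta = max(delta, D)           # non-compensable drug surge (assumed)
    Q[d][a] += ALPHA * delta
    d = d2

for d in range(N):
    pick = "DRUG" if Q[d][DRUG] > Q[d][HEALTHY] else "healthy"
    print(f"d={d}: Q(healthy)={Q[d][HEALTHY]:9.2f}  "
          f"Q(drug)={Q[d][DRUG]:9.2f}  -> {pick}")
```

Because each consumption of the drug seed raises Q(d, DRUG) by at least ALPHA * D, the drug's estimated value grows without bound and the greedy policy locks onto it in every state, which is the qualitative signature of the addictive policies studied above.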
The AGI Containment Problem
There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs to tamper with their test environments, make copies of themselves on the internet, or convince developers and operators to do dangerous things. In this paper, we survey the AGI containment problem: the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous. We identify requirements for AGI containers, available mechanisms, and weaknesses that need to be addressed.
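As a concrete illustration of the kind of mechanism such a container might stack, here is a minimal sketch of one low-level layer, OS resource limits plus a wall-clock kill switch around an untrusted test process, assuming a POSIX system; the script name agent_under_test.py is hypothetical, and a real container would add network isolation, virtualization, monitoring, and tripwires on top of this.

```python
import resource
import subprocess

# One illustrative containment layer, NOT the paper's design: cap what the
# test process can consume, and kill it on a hard wall-clock budget.

def limit_resources():
    """Runs in the child process before exec: cap CPU time, memory, forks."""
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))            # 60 s CPU
    resource.setrlimit(resource.RLIMIT_AS, (1 << 31, 1 << 31))   # 2 GiB memory
    resource.setrlimit(resource.RLIMIT_NPROC, (16, 16))          # no fork bombs

try:
    result = subprocess.run(
        ["python3", "agent_under_test.py"],   # hypothetical test harness
        preexec_fn=limit_resources,           # POSIX only
        capture_output=True,
        timeout=120,                          # wall-clock kill switch
    )
    print(result.stdout.decode())
except subprocess.TimeoutExpired:
    print("test run exceeded wall-clock budget; terminated")
```

Note that limits like these only address the resource-exhaustion corner of the problem; the abstract's harder failure modes (tampering with the test environment, self-exfiltration, persuading operators) require the additional mechanisms the paper surveys.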
- …